Global Significant Earthquakes

The National Oceanic and Atmospheric Administration (NOAA) Global Significant Earthquake Database compiles the significant earthquakes that have rocked the world from 2150 BCE to the present. The Pacific Ring of Fire shows more pronounced tsunamis triggered by earthquakes. The story is here.



Posted in Visualization

World Satellites Exploration

The UCS Satellite Database lists the satellites orbiting Earth. As of June 1, 2017, 1,459 satellites were orbiting Earth. Country of origin indicates the country registered as responsible for the satellite in the UN Register of Space Objects; "NR" indicates that the satellite has never been registered with the United Nations. Operator country is the country that operates or owns the satellite. Contractor country is the country (or the business entities) responsible for building the satellite. The United States leads in every single category. India is picking up pace, with 97 satellites launched from the Satish Dhawan Space Center and 44 satellites built. Atlas V, the launch vehicle that launched the most satellites, was formerly operated by Lockheed Martin and is now operated by the Lockheed Martin-Boeing joint venture. Most of the satellites were launched in the past five years, and most are used for commercial communications. Explore more details here.
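To make the three country attributes concrete, here is a minimal Python sketch that tallies satellites per country for each attribute and lists the ones marked "NR". The records, names, and field names (origin, operator, contractor) are hypothetical stand-ins, not the UCS Satellite Database's actual columns or data.

```python
from collections import Counter

# Hypothetical records: names, field names, and values are invented for
# illustration and do not come from the UCS Satellite Database.
satellites = [
    {"name": "Sat-A", "origin": "USA", "operator": "USA", "contractor": "USA"},
    {"name": "Sat-B", "origin": "India", "operator": "India", "contractor": "India"},
    {"name": "Sat-C", "origin": "NR", "operator": "USA", "contractor": "USA"},
    {"name": "Sat-D", "origin": "USA", "operator": "USA", "contractor": "France"},
]

def tally(records, field):
    """Count satellites per country for one attribute: origin, operator, or contractor."""
    return Counter(r[field] for r in records)

def never_registered(records):
    """Names of satellites whose country of origin is "NR" (not in the UN register)."""
    return [r["name"] for r in records if r["origin"] == "NR"]

for field in ("origin", "operator", "contractor"):
    print(field, tally(satellites, field).most_common())
print("NR:", never_registered(satellites))
```

The same satellite can fall into different buckets per attribute (Sat-D here has a US origin but a French contractor), which is why each category gets its own ranking.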

World Satellites

Posted in Visualization

Primum Non Nocere

I want to discuss one of the lesser-known studies in the world. The study, named the "Mushroom Trial," was spearheaded by my loving mother, and the subjects were my immediate family. She loves trying new recipes, and in the late '90s she found a new "vegetable" to cook with: mushrooms. We were all thrilled to try something new, although I voiced my concern: "but that is a fungus." Everyone enjoyed the meal, and I didn't have my usual portion because of my fungus apprehensions. Twenty minutes later, I had intense stomach cramps and started throwing up. My mom was annoyed by the unwarranted trip to the ER for non-stop projectile vomiting over something as innocent and pure as farm-fresh, organic ingredients and love. She assumed that I had been jumping around and climbing trees in the backyard after my meal. The next time she cooked mushrooms, she asked me to have a generous helping. This time the results were more ominous: I vomited bile and broke out in pronounced rashes. This time I hadn't been jumping around, and in the ER she suspected that the kind of mushrooms she cooked could be the reason. Not wanting to upset my mother, I grew numb with apprehension while becoming increasingly wary of physical distress. Eventually the ER doctor ended the mushroom trial: "Can we just say she is allergic to all kinds of mushrooms and save everyone time?" Overall, her trial went like this:

Mushroom Study

Now, let us analyze the data from this study:

Informed Consent

The most essential part of any data collection or study is informed consent. An Institutional Review Board (IRB) requires informed consent for research involving human subjects unless the study is granted a waiver or alteration, for example because of its sensitive nature. There are two parts here: "inform" and "consent." You have to inform the human subject and obtain their consent. In this trial, Mom did inform everyone about the mushrooms, but she did not necessarily obtain consent. That raises another interesting question: although it was one study, there were five individual experiments. So, do we need informed consent only at the beginning, or do we need to keep renewing it as we change the parameters of the study? Informed consent is more than a form or a piece of paper; it is a process grounded in respect for individuals. Participation in any research is voluntary.

Data Inputs

There are several issues with the data inputs. The five subjects were a poor sample from a big family tree, and the sample is biased toward the immediate family. Beyond that, the study also reinforces a historical bias: "Mom knows best." My mom never intentionally tried to harm me. Toward the end, I did object that mushrooms made me sick. Unlike in the United States, food allergies are not that prevalent in India, so much so that even the doctor took time to reach the conclusion. The dataset is also incomplete: I am not sure whether I still had the same "but that is a fungus" apprehension after the first time. I just went along with my mom, trying not to upset her. In a way, it is incorrect to conclude "allergic to all mushrooms." The data is also outdated: this dataset is from my childhood, and some allergies can fade as a child grows.

Algorithmic Bias

The problem with this study is assuming that correlation implies causation. It is common wisdom not to engage in strenuous physical activity after meals, and it is also true that not every kind of food in the same food group causes an adverse reaction. Yet both explanations, jumping around after the meal and the particular kind of mushroom, were wrong about the true cause. When personalization or recommendation algorithms are built on incorrect assumptions and bad data inputs, they can narrow the range of possibilities and create tunnel vision.
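Here is a minimal Python sketch of how two explanations get confounded, using invented observations for a single hypothetical diner rather than any real data. With only the first meal, activity and mushrooms both co-occur with sickness and cannot be told apart; comparing conditional sickness rates across all meals separates them. Even then, the result is an association, not proof of cause, and it says nothing about mushroom varieties that were never observed.

```python
# Hypothetical meals for one diner: (ate_mushrooms, active_after_meal, got_sick).
# These rows are invented for illustration; they are not data from the story.
meals = [
    (True, True, True),     # sick after mushrooms; activity was assumed
    (True, False, True),    # sick after mushrooms; no activity
    (False, True, False),
    (False, True, False),
    (False, False, False),
    (False, False, False),
]

MUSHROOMS, ACTIVE, SICK = 0, 1, 2

def sick_rate(rows, factor, value):
    """Fraction of rows where row[factor] == value that ended in sickness (None if no rows)."""
    subset = [r for r in rows if r[factor] == value]
    if not subset:
        return None  # no evidence either way
    return sum(r[SICK] for r in subset) / len(subset)

# One observation: both candidate causes co-occur with sickness, so they are confounded.
first = meals[:1]
print("first meal:", sick_rate(first, ACTIVE, True), sick_rate(first, MUSHROOMS, True))

# All observations: activity no longer separates sick from healthy meals,
# while mushrooms still do. This is an association, not proof of cause.
print("active:", sick_rate(meals, ACTIVE, True), "vs", sick_rate(meals, ACTIVE, False))
print("mushrooms:", sick_rate(meals, MUSHROOMS, True), "vs", sick_rate(meals, MUSHROOMS, False))
```

A recommendation model trained only on the first row would have no way to prefer one explanation over the other, which is the tunnel-vision risk described above.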

Data Privacy

Aside from talking about connections in a philosophical way, we are more connected than ever through the Internet of Things. In this connected world, where information is literally at our fingertips, what has become of privacy? I chose to expose this study and dataset on my website, and in doing so I exposed my immediate family. My family is very private; I have the most social media presence of any of us. I made the dataset public. Have I violated their privacy? Not by the letter of the law. Are there ethical violations? There is very little chance that my parents will read this and get upset about it. There is a possibility that my siblings will read it (I usually ask them to read my posts); if they read it and raise concerns, what are the possible actions? Should I say "my blog, my stories," "offer them some form of compensation," or "take down this post"?

Technological advancements such as big data open doors to endless possibilities everywhere. As always, with great power comes great responsibility. Data laws are still in the early stages of evolution. Ethics can get polarizing and controversial, as with any issue in this country. Study after study I read shows that we are charting unexplored waters as the real world gets increasingly complex. That reminds me of the principal precept in medicine and bioethics: "First, do no harm!" The data world could start there too. I want to share my musings on the one area where we should keep our focus as our digital universe explodes: ethics.

Posted in Big Data, Conceptual, ethics

SEPTA Regional Rail OTP 2016

Exploring SEPTA Regional Rail on-time performance (OTP) in 2016 from the dataset here. Overall, the busy days are weekdays rather than weekends. Trains originating from the south also face more delays than those from the north. The heatmap of delays showed an interesting patch of red that made me google: what happened in the first week of November 2016? A SEPTA strike! Here is the story.
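A minimal Python sketch of the two aggregations behind this kind of exploration: mean delay on weekdays versus weekends, and mean delay per (ISO week, weekday) cell for a heatmap. The (date, minutes late) record layout and the values are assumptions for illustration; the actual OTP dataset's columns may differ.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (service date, minutes late) records; values are invented and the
# real OTP dataset's columns may differ.
records = [
    (date(2016, 10, 31), 3),   # Monday
    (date(2016, 11, 1), 25),   # Tuesday
    (date(2016, 11, 2), 30),   # Wednesday
    (date(2016, 11, 5), 1),    # Saturday
    (date(2016, 11, 6), 0),    # Sunday
]

def mean_delay_by_daytype(rows):
    """Average delay on weekdays vs. weekends (weekday() 5 and 6 are Saturday and Sunday)."""
    groups = defaultdict(list)
    for day, late in rows:
        groups["weekend" if day.weekday() >= 5 else "weekday"].append(late)
    return {kind: mean(values) for kind, values in groups.items()}

def heatmap_cells(rows):
    """Mean delay per (ISO week number, weekday index): one cell of a week x day heatmap."""
    cells = defaultdict(list)
    for day, late in rows:
        cells[(day.isocalendar()[1], day.weekday())].append(late)
    return {cell: mean(values) for cell, values in cells.items()}

print(mean_delay_by_daytype(records))
print(heatmap_cells(records))
```

Keying cells by weekday index rather than a formatted day name keeps the grid independent of locale settings; a plotting library can then color each (week, day) cell by its mean delay.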



Posted in Visualization

Philadelphia Crime Story

OpenDataPhilly has datasets of crimes in Philadelphia from 2006 to the present. Here is an exploration of crime in Philadelphia. In general, there is a downward trend in overall crime rates. There also seem to be seasonal peaks and declines: for example, crime tends to be low during winter and surges in summer. Here is the story…
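A minimal Python sketch of checking for seasonality: count incidents per month and per meteorological season. The incident dates are invented, and the month-to-season mapping is an assumed convention; OpenDataPhilly's actual schema is not reproduced here.

```python
from collections import Counter
from datetime import date

# Hypothetical incident dates, invented for illustration; they are not
# OpenDataPhilly records.
incidents = [
    date(2016, 1, 12), date(2016, 2, 3),
    date(2016, 7, 4), date(2016, 7, 19), date(2016, 8, 2), date(2016, 8, 30),
    date(2016, 12, 25),
]

# Meteorological seasons (an assumed convention): Dec-Feb winter, Jun-Aug summer.
SEASON = {12: "winter", 1: "winter", 2: "winter",
          3: "spring", 4: "spring", 5: "spring",
          6: "summer", 7: "summer", 8: "summer",
          9: "fall", 10: "fall", 11: "fall"}

def counts_by_month(dates):
    """Incidents per (year, month): the series behind a monthly trend line."""
    return Counter((d.year, d.month) for d in dates)

def counts_by_season(dates):
    """Incidents per season, pooling every year; December joins the winter bucket."""
    return Counter(SEASON[d.month] for d in dates)

print(sorted(counts_by_month(incidents).items()))
print(counts_by_season(incidents))
```

Because the season key ignores the year, December is pooled with January and February of the same calendar year; for a multi-year trend, keying each winter by the year it starts in would be more faithful.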

Posted in Visualization

Hadoop Ecosystem – A Quick Glance

What do Pig, Kangaroo, Eagle, and Phoenix have in common? Hadoop! There are some interesting technologies with curious names in the Hadoop ecosystem. Azkaban is bloody wicked. H2O and Sparkling Water compete in the same space. Rethink, Couch, Dynamo, and GemFire would make you think you just got out of a positive affirmations seminar. Bad jokes aside, the Hadoop ecosystem has been growing. Here is a quick glance with my little tweaks:

Posted in Hadoop


I am trying to pick out all my data science presentations and consolidate them here.

Posted in Big Data