Posts

Showing posts from 2021

Data Scientist's Self Defeating Prophecy

Jonah, a biblical prophet, was tasked with delivering God's prophecy to the people of Nineveh, an ancient Mesopotamian city. He was sent to Nineveh to warn the city's inhabitants of the divine wrath to come as a result of their sins. After an eventful journey, Jonah reaches Nineveh and warns the citizens of the impending doom. The citizens of Nineveh repent of their actions and pray every day. Jonah waits outside the city for God to come and punish its citizens. However, God does not show up, because the citizens changed their ways for the better. Because the prophecy was revealed early and people intervened, it did not come true. Such prophecies are called self-defeating prophecies. In such cases, determining whether the prophecy would have come true is difficult, and the foreteller's credibility may be called into question. Modern-day digital soothsayers, or data scientists, face this challenge routinely. We are frequently expected to predict an undesirable outcome and prevent it from occurri

Cartoon Corner

    Defining success on your own terms Stay in your rabbit hole

Delude of Accuracy

Accuracy is one of the metrics used to evaluate a predictive model. It is also an everyday English word, and how it is interpreted in meetings between business users and ML engineers can have significant implications. While there are many other metrics, such as precision, recall, and F1 score [1], most business users relate to accuracy. Yet accuracy can often be misleading, and decisions based on a surface-level evaluation of any single model metric can result in losses. Let me clarify what I mean. A bank wants to predict who is likely to default on a loan and decide whether to disburse it. Now, what is an acceptable accuracy for the predictive model? Can I use a model that has 20% accuracy? Is 90% good enough to put my models in production? Well, it depends. Is 95% accuracy good enough? Accuracy is the proportion of records that are correctly classified. Let us say we have 95% accuracy. The bank sees about two defaulters fo
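To make the point about accuracy concrete, here is a minimal sketch in Python. The portfolio is hypothetical: the figure of 5 defaulters in 100 loans is an assumption chosen for illustration, not a number taken from the post. It shows that a "model" that never flags anyone can still report 95% accuracy while catching no defaulters at all.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the actual label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (defaulters) that the model flagged."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical portfolio (assumed numbers): 1 = defaulted, 0 = repaid.
y_true = [1] * 5 + [0] * 95

# A trivial baseline that predicts "will repay" for every applicant.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)

print(f"accuracy: {baseline_accuracy:.2f}")
print(f"recall on defaulters: {baseline_recall:.2f}")
```

Under these assumed numbers, the headline accuracy looks strong even though the baseline is useless for the bank's actual goal, which is why accuracy is usually read alongside metrics such as precision and recall.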

When not to use Artificial Intelligence

Artificial intelligence is at the peak of the Gartner hype cycle [1]. This hype can pressure teams to take up unreasonable projects. Teams need to perform due diligence before embarking on a machine learning use case, because alternative solutions can work out to be cheaper and more efficient. Projects with no timely action: A machine learning (ML) project may be brilliant in itself, but unless the business can act upon the results, the model is of no use. Not only should we be able to take action, we should also be able to take it promptly. Here are a couple of examples. Employee attrition: Consider a team working on a model to forecast employee turnover. The data science team goes to great lengths to collect data from several sources, including social media, browsing habits, and so on. They develop a robust black-box model using this hard-won data to determine whether an employee will quit in the next 30 days. When HR gets these results, they do not know what to do. Unle